{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![LOGO](../tutorial/tutorial_notebooks/img/MODIN_ver2_hrz.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## modin.spreadsheet\n",
    "`modin.spreadsheet` is a Jupyter notebook widget that allows users to interact with Modin DataFrames in a spreadsheet-like fashion while taking advantage of the underlying capabilities of Modin. The widget makes it quick and easy to explore, sort, filter, edit data and export reproducible code. \n",
    "\n",
    "This tutorial will showcase how to use `modin.spreadsheet`. Before starting, please install the required packages using `pip install -r requirements.txt` in the current directory. Then just run the cells; no editing required!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Please install the required packages using `pip install -r requirements.txt` in the current directory\n",
    "# For all ways to install Modin see official documentation at:\n",
    "# https://modin.readthedocs.io/en/latest/installation.html\n",
    "import modin.pandas as pd\n",
    "import modin.spreadsheet as mss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a Modin DataFrame\n",
    "The following cells creates a DataFrame using a NYC taxi dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "columns_names = [\n",
    "        \"trip_id\", \"vendor_id\", \"pickup_datetime\", \"dropoff_datetime\", \"store_and_fwd_flag\",\n",
    "        \"rate_code_id\", \"pickup_longitude\", \"pickup_latitude\", \"dropoff_longitude\", \"dropoff_latitude\",\n",
    "        \"passenger_count\", \"trip_distance\", \"fare_amount\", \"extra\", \"mta_tax\", \"tip_amount\",\n",
    "        \"tolls_amount\", \"ehail_fee\", \"improvement_surcharge\", \"total_amount\", \"payment_type\",\n",
    "        \"trip_type\", \"pickup\", \"dropoff\", \"cab_type\", \"precipitation\", \"snow_depth\", \"snowfall\",\n",
    "        \"max_temperature\", \"min_temperature\", \"average_wind_speed\", \"pickup_nyct2010_gid\",\n",
    "        \"pickup_ctlabel\", \"pickup_borocode\", \"pickup_boroname\", \"pickup_ct2010\",\n",
    "        \"pickup_boroct2010\", \"pickup_cdeligibil\", \"pickup_ntacode\", \"pickup_ntaname\", \"pickup_puma\",\n",
    "        \"dropoff_nyct2010_gid\", \"dropoff_ctlabel\", \"dropoff_borocode\", \"dropoff_boroname\",\n",
    "        \"dropoff_ct2010\", \"dropoff_boroct2010\", \"dropoff_cdeligibil\", \"dropoff_ntacode\",\n",
    "        \"dropoff_ntaname\", \"dropoff_puma\",\n",
    "    ]\n",
    "parse_dates=[\"pickup_datetime\", \"dropoff_datetime\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('s3://modin-datasets/trips_data.csv', names=columns_names,\n",
    "                header=None, parse_dates=parse_dates)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate a spreadsheet widget with the DataFrame\n",
    "`mss.from_dataframe` takes in a DataFrame, optional configuration options, and returns a `SpreadsheetWidget`, which contains all the logic for displaying the spreadsheet view of the DataFrame. The object returned will not be rendered unless displayed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spreadsheet = mss.from_dataframe(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Displaying the Spreadsheet\n",
    "The widget is displayed when the widget is returned by an input cell or passed to the `display` function e.g. `display(spreadsheet)`. When displayed, the SpreadsheetWidget will generate a transformation history cell that contains a record of the transformations applied to the DataFrame unless the cell already exists or the feature is disabled.\n",
    "\n",
    "### Basic Usage\n",
    "`from_dataframe` creates a copy of the input DataFrame, so changes do not alter the original DataFrame.\n",
    "\n",
    "**Filter** - Each column can be filtered according to its datatype using the filter button to the right of the column header. Any number of columns can be filtered simultaneously.\\\n",
    "**Sort** - Each column can be sorted by clicking on the column header. Assumptions on the order of the data should only be made according to the latest sort i.e. the 2nd last sort may not be in order even if grouped by the duplicates in the last sorted column.\\\n",
    "**Cell Edit** - Double click on a cell to edit its value.\\\n",
    "**Add Row**(toolbar) - Click on the `Add Row` button in the toolbar to duplicate the last row in the DataFrame.\\\n",
    "**Remove Row**(toolbar) - Select row(s) on the spreadsheet and click the `Remove Row` button in the toolbar to remove them.\\\n",
    "**Reset Filters**(toolbar) - Click on the `Reset Filters` button in the toolbar to remove all filters on the data.\\\n",
    "**Reset Sort**(toolbar) - Click on the `Reset Sort` button in the toolbar to remove any sorting on the data.\n",
    "\n",
    "### Transformation History and Reproducible Code\n",
    "The widget records the history of transformations, such as filtering, that occur on the spreadsheet. These transformations are updated in the `spreadsheet transformation history` cell as they happen and can be easily copied for reproducibility. The history can be cleared using the `Clear History` button in the toolbar.\n",
    "\n",
    "**Try making some changes to the spreadsheet!**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "spreadsheet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exporting Changes\n",
    "`to_dataframe` takes in a `SpreadsheetWidget` and returns a copy of the DataFrame reflecting the current state of the UI on the widget. Specifically, any filters, edits, or sorts will be applied on the returned Dataframe.\n",
    "\n",
    "**Export a DataFrame after making some changes on the spreadsheet UI**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "changed_df = mss.to_dataframe(spreadsheet)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "changed_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SpreadsheetWidget API\n",
    "The API on `SpreadsheetWidget` allows users to replicate some of the functionality on the GUI, but also provides other functionality such as applying the transformation history on another DataFrame or getting the DataFrame that matches the spreadsheet state like `to_dataframe`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Duplicates the `Reset Filters` button\n",
    "spreadsheet.reset_filters()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Duplicates the `Reset Sort` button\n",
    "spreadsheet.reset_sort()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "qgrid6f69f373-ae0e-423e-8e26-429f52e1669d": true
   },
   "outputs": [],
   "source": [
    "# Duplicates the `Clear History` button\n",
    "spreadsheet.clear_history()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Gets the modified DataFrame that matches the changes to the spreadsheet\n",
    "# This is the same functionality as `mss.to_dataframe`\n",
    "spreadsheet.get_changed_df()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Retrieving and Applying Transformation History \n",
    "The transformation history can be retrieved as a list of code snippets using the `get_history` API. The `apply_history` API will apply the transformations on the input DataFrame and return the resultant DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "spreadsheet.get_history()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "another_df = df.copy()\n",
    "spreadsheet.apply_history(another_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Additional Example\n",
    "Here is another example of how to use `from_dataframe` with configuration options."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mss.from_dataframe(df, show_toolbar=False, grid_options={'forceFitColumns': False, 'editable': False, 'highlightSelectedCell': True})"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
